Pandas Basics
In [26]:
import pandas as pd
print(pd.__version__)
1.3.1
Pandas Series
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
In [13]:
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
0    1
1    7
2    2
dtype: int64
In [17]:
print(myvar[1])
7
- With the index argument, you can name your own labels.
In [14]:
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
x    1
y    7
z    2
dtype: int64
In [19]:
print(myvar["y"])
7
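Once a Series has custom labels, it can help to be explicit about whether a lookup is by label or by position. The following is a minimal sketch, not part of the original notebook, using the standard `.loc` (label) and `.iloc` (integer position) accessors on the same data:

```python
import pandas as pd

# Same data as above, with custom string labels.
myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = myvar.loc["y"]     # look up by label
by_position = myvar.iloc[1]   # look up by 0-based integer position

print(by_label, by_position)
```

Both lookups return the same element here because "y" is the second label.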
- You can also use a key/value object, like a dictionary, when creating a Series.
In [20]:
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
day1    420
day2    380
day3    390
dtype: int64
Note: The keys of the dictionary become the labels.
In [21]:
print(myvar['day1'])
420
- To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.
In [23]:
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
day1    420
day2    380
dtype: int64
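A related case worth knowing: if the index argument names a label that is not a key in the dictionary, pandas fills that entry with NaN instead of raising an error. The sketch below illustrates this; the label "day4" is a hypothetical one chosen only for the example.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that does not exist in the dictionary.
subset = pd.Series(calories, index=["day1", "day4"])

print(subset)
print(subset.isna().tolist())  # which entries are missing
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.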
Pandas DataFrame
Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
A Series is like a column; a DataFrame is the whole table.
In [48]:
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
   calories  duration
0       420        50
1       380        40
2       390        45
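To make the "column versus whole table" relationship concrete, the following sketch (an illustration added here, not from the original notebook) pulls one column out of the same DataFrame and checks its type:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

# Selecting a single column by name yields a Series.
col = df["calories"]

print(type(col).__name__)
print(col.tolist())
print(df.shape)  # (rows, columns) of the whole table
```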
Locate Row
Pandas uses the loc attribute to return one or more specified rows.
In [49]:
print(df.loc[0])
calories    420
duration     50
Name: 0, dtype: int64
Note: This example returns a Pandas Series.
In [50]:
print(df.loc[[0, 1]])
   calories  duration
0       420        50
1       380        40
Note: When a list of labels is passed to loc, as in df.loc[[0, 1]], the result is a Pandas DataFrame.
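The difference between a single label and a list of labels can be checked directly. This is a small sketch under the same data as above; the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]        # single label: the row comes back as a Series
one_row_df = df.loc[[0]]   # list with one label: the result stays a DataFrame

print(type(one_row).__name__)
print(type(one_row_df).__name__)
print(one_row_df.shape)
```

Even a one-element list keeps the two-dimensional shape, which is useful when later code expects a DataFrame.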
In [35]:
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
      calories  duration
day1       420        50
day2       380        40
day3       390        45
In [36]:
print(df.loc["day2"])
calories    380
duration     40
Name: day2, dtype: int64
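With a named index, loc also accepts a column label as a second argument, or a list of row labels. The sketch below is an added illustration built on the same DataFrame, not a step from the original notebook:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Row label plus column label selects a single cell.
value = df.loc["day2", "calories"]

# A list of row labels selects several rows as a DataFrame.
rows = df.loc[["day1", "day3"]]

print(value)
print(rows)
```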
Load Files Into a DataFrame
In [51]:
df = pd.read_csv('pokemon_data.csv')
print(df)
# Name Type 1 Type 2 HP Attack Defense \
0 1 Bulbasaur Grass Poison 45 49 49
1 2 Ivysaur Grass Poison 60 62 63
2 3 Venusaur Grass Poison 80 82 83
3 3 VenusaurMega Venusaur Grass Poison 80 100 123
4 4 Charmander Fire NaN 39 52 43
.. ... ... ... ... .. ... ...
795 719 Diancie Rock Fairy 50 100 150
796 719 DiancieMega Diancie Rock Fairy 50 160 110
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60
799 721 Volcanion Fire Water 80 110 120
Sp. Atk Sp. Def Speed Generation Legendary
0 65 65 45 1 False
1 80 80 60 1 False
2 100 100 80 1 False
3 122 120 80 1 False
4 60 50 65 1 False
.. ... ... ... ... ...
795 100 150 50 6 True
796 160 110 110 6 True
797 150 130 70 6 True
798 170 130 80 6 True
799 130 90 70 6 True
[800 rows x 12 columns]
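For trying read_csv without the pokemon_data.csv file on disk, the same function can read from an in-memory text buffer. The sketch below assumes a tiny comma-separated sample built from three rows and five columns of the output above; the real file may differ in delimiter or layout.

```python
import io
import pandas as pd

# Small stand-in for the CSV file (an assumption: comma-separated with a header row).
csv_text = """#,Name,Type 1,Type 2,HP
1,Bulbasaur,Grass,Poison,45
2,Ivysaur,Grass,Poison,60
4,Charmander,Fire,,39
"""

df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small.shape)                    # (rows, columns)
print(df_small.head(2))                  # first two rows only
print(df_small["Type 2"].isna().sum())   # count of missing "Type 2" values
```

The empty "Type 2" field for Charmander is read as NaN, matching the NaN shown in the full output. For large frames, head() is often more convenient than printing the whole DataFrame.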